{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Introduction to Sets\n",
    "\n",
    "A `set` in Python is a collection of **unique** and **immutable** (unchangeable) objects.\n",
    "\n",
    "To see how a set works, take a look at the code below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "set()\n"
     ]
    }
   ],
   "source": [
    "# start with an empty set\n",
    "\n",
    "my_set = set()\n",
    "\n",
    "print(my_set)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{1, 2, 3, 'b', 'a', 'c'}\n"
     ]
    }
   ],
   "source": [
    "# add a few elements\n",
    "\n",
    "my_set.add('a')\n",
    "my_set.add('b')\n",
    "my_set.add('c')\n",
    "my_set.add(1)\n",
    "my_set.add(2)\n",
    "my_set.add(3)\n",
    "\n",
    "print(my_set)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1\n",
      "2\n",
      "3\n",
      "b\n",
      "a\n",
      "c\n"
     ]
    }
   ],
   "source": [
    "# a set is UNORDERED, so the order of the elements below is\n",
    "# not guaranteed. We can still loop through a set though.\n",
    "\n",
    "for element in my_set:\n",
    "    print(element)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "there are 6 elements in my_set\n"
     ]
    }
   ],
   "source": [
    "# let's see how many elements are in this set...\n",
    "\n",
    "print(\"there are\", len(my_set), \"elements in my_set\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "there are 6 elements in my_set\n"
     ]
    }
   ],
   "source": [
    "# can we make the set bigger by adding more \"copies\"\n",
    "# of existing elements?\n",
    "\n",
    "my_set.add(\"a\")\n",
    "my_set.add(\"a\")\n",
    "my_set.add(\"a\")\n",
    "my_set.add(\"a\")\n",
    "my_set.add(\"a\")\n",
    "my_set.add(\"a\")\n",
    "my_set.add(\"a\")\n",
    "my_set.add(\"a\")\n",
    "\n",
    "print(\"there are\", len(my_set), \"elements in my_set\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{1, 2, 3, 'b', 'a', 'c'}\n"
     ]
    }
   ],
   "source": [
    "# there are still only 6 elements...\n",
    "# \n",
    "# that's because sets only care about UNIQUE elements.\n",
    "# They do not allow for multiple \"copies\"\n",
    "\n",
    "print(my_set)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "there are 5 elements in my_set\n",
      "{1, 2, 3, 'b', 'c'}\n"
     ]
    }
   ],
   "source": [
    "# and they haven't changed. What if we remove \"a\"?\n",
    "my_set.remove(\"a\")\n",
    "print(\"there are\", len(my_set), \"elements in my_set\")\n",
    "print(my_set)"
   ]
  },
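  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The introduction says that a set holds immutable objects. More precisely, Python requires every element of a set to be *hashable*. The cell below is a minimal sketch of what that means in practice. The name `demo_set` is made up for this illustration and is not used anywhere else in the notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# sketch: set elements must be hashable, which immutable\n",
    "# built-ins such as numbers, strings and tuples are\n",
    "\n",
    "demo_set = set()\n",
    "demo_set.add((1, 2))       # a tuple of numbers is hashable, so this works\n",
    "\n",
    "try:\n",
    "    demo_set.add([1, 2])   # a list is mutable, so this raises TypeError\n",
    "except TypeError as error:\n",
    "    print(\"could not add a list:\", error)\n",
    "\n",
    "print(demo_set)"
   ]
  },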
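  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### The Power of Sets\n",
    "\n",
    "The `set` type is inspired by a branch of mathematics called [set theory](https://en.wikipedia.org/wiki/Set_theory).\n",
    "\n",
    "Sets are so useful because they let us apply \"Venn diagram logic\".\n",
    "\n",
    "For example, here are two sets: `primes`, which contains the prime numbers less than 10, and `odds`, which contains the odd numbers less than 10. One way to explore the relationship between these two sets is with a **Venn diagram**.\n",
    "\n",
    "![Venn Diagram](https://d17h27t6h515a5.cloudfront.net/topher/2017/November/5a023c16_sets-1/sets-1.png)\n",
    "\n",
    "In this diagram, the blue region contains the odd numbers, the red region contains the primes, and the overlapping purple region contains the numbers that are both odd and prime.\n",
    "\n",
    "## Note - slow down here!\n",
    "\n",
    "The next few cells demonstrate some set **methods**. Read them carefully, and try to relate each one to the Venn diagram above."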
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initializing two sets\n",
    "\n",
    "odds   = set([1,3,5,7,9])\n",
    "primes = set([2,3,5,7])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{3, 5, 7}\n"
     ]
    }
   ],
   "source": [
    "# Demonstration of the \"intersection\" between two sets\n",
    "# The intersection corresponds to the overlapping region\n",
    "# in the Venn Diagram above.\n",
    "\n",
    "odd_AND_prime = odds.intersection(primes)\n",
    "print(odd_AND_prime)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{1, 2, 3, 5, 7, 9}\n"
     ]
    }
   ],
   "source": [
    "# Demonstration of the \"union\" of two sets. The union\n",
    "# of sets A and B includes ANY element that is in A OR B \n",
    "# or both.\n",
    "\n",
    "odd_OR_prime = odds.union(primes)\n",
    "print(odd_OR_prime)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{1, 9}\n"
     ]
    }
   ],
   "source": [
    "# Demonstration of the \"set difference\" between two sets.\n",
    "# What do you expect odds-primes to return?\n",
    "\n",
    "odd_not_prime = odds - primes\n",
    "print(odd_not_prime)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{2}\n"
     ]
    }
   ],
   "source": [
    "# Another demo of \"set difference\"\n",
    "\n",
    "prime_not_odd = primes - odds\n",
    "print(prime_not_odd)"
   ]
  },
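  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each of the operations demonstrated above has two spellings: a method and an operator. The cell below is a small sketch of the equivalent forms. It redefines `odds` and `primes` with the same values used earlier so that it can run on its own."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# sketch: method and operator spellings of the same operations\n",
    "\n",
    "odds   = set([1,3,5,7,9])\n",
    "primes = set([2,3,5,7])\n",
    "\n",
    "# intersection: method vs. the & operator\n",
    "print(odds.intersection(primes) == odds & primes)\n",
    "\n",
    "# union: method vs. the | operator\n",
    "print(odds.union(primes) == odds | primes)\n",
    "\n",
    "# difference: method vs. the - operator\n",
    "print(odds.difference(primes) == odds - primes)"
   ]
  },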
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Union vs Intersection\n",
    "\n",
    "The **union** of two sets A and B contains every element that is in A *or* in B *or* in both. The **intersection** contains only the elements that are in both.\n",
    "\n",
    "![set union vs intersection](https://d17h27t6h515a5.cloudfront.net/topher/2017/November/5a04a22f_sets-2/sets-2.png)\n",
    "\n",
    "### `A-B` vs `B-A`\n",
    "\n",
    "**`A-B`** contains the elements that are in A but not in B.\n",
    "\n",
    "**`B-A`** contains the elements that are in B but not in A.\n",
    "\n",
    "![set_a - set_b](https://d17h27t6h515a5.cloudfront.net/topher/2017/November/5a04a22f_sets-3/sets-3.png)\n",
    "\n",
    "![set_b - set_a](https://d17h27t6h515a5.cloudfront.net/topher/2017/November/5a04a230_sets-4/sets-4.png)\n",
    "\n",
    "## TODO - Exercise: A or B, but not both\n",
    "Write a function that takes two sets (`set_a` and `set_b`) as input and returns a new set containing every element that is in `set_a` or `set_b` but **not** in both.\n",
    "\n",
    "In the Venn diagram above, this is everything except the overlapping region in the middle. For this example, that means the numbers 9, 1, and 2.\n",
    "\n",
    "Note - try to use all of the following set operations in your answer:\n",
    "\n",
    "* **`intersection`**\n",
    "* **`union`**\n",
    "* **`difference`**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Nice job! Your function works correctly!\n"
     ]
    }
   ],
   "source": [
    "def a_or_b_but_not_both(set_a, set_b):\n",
    "    \"\"\"Returns a set which contains any element that is \n",
    "    a member of set_a OR a member of set_b but NOT a member\n",
    "    of both.\"\"\"\n",
    "    s = set()\n",
    "    for i in (set_a-set_b):\n",
    "        s.add(i)\n",
    "    for i in (set_b-set_a):\n",
    "        s.add(i)\n",
    "    \n",
    "    return s\n",
    "\n",
    "# testing code\n",
    "assert a_or_b_but_not_both(odds, primes) == set([9,1,2])\n",
    "print(\"Nice job! Your function works correctly!\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution 1\n",
    "def a_or_b_but_not_both(set_a, set_b):\n",
    "    \"\"\"Returns a set which contains any element that is \n",
    "    a member of set_a OR a member of set_b but NOT a member\n",
    "    of both.\"\"\"\n",
    "    a_and_b = set_a.intersection(set_b)\n",
    "    a_or_b = set_a.union(set_b)\n",
    "    return a_or_b - a_and_b\n",
    "\n",
    "assert a_or_b_but_not_both(odds, primes) == set([9,1,2])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution 2\n",
    "def a_or_b_but_not_both(set_a, set_b):\n",
    "    \"\"\"Returns a set which contains any element that is \n",
    "    a member of set_a OR a member of set_b but NOT a member\n",
    "    of both.\"\"\"\n",
    "    a_without_b = set_a - set_b\n",
    "    b_without_a = set_b - set_a\n",
    "    return a_without_b.union(b_without_a)\n",
    "\n",
    "assert a_or_b_but_not_both(odds, primes) == set([9,1,2])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution 3\n",
    "# \n",
    "# There is actually a REALLY succinct way of writing this \n",
    "# function using a single character. I'm not going to tell\n",
    "# you what that is now, but at the end of this notebook \n",
    "# you'll find a link to Python documentation that will..."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Great, but what about labels and tags?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Nice job! Your function works correctly!\n"
     ]
    }
   ],
   "source": [
    "def consolidate_labels(t1_labels, t2_labels):\n",
    "    \"\"\"\n",
    "    Combines labels from two tickets without duplication.\n",
    "    \n",
    "    Given t1_labels and t2_labels (both lists), return \n",
    "    a consolidated list of labels without duplicates.\n",
    "    \"\"\"\n",
    "    \n",
    "    # TODO - rewrite this function to use sets. You should\n",
    "    #   be able to replace all the code below with 1 or 2\n",
    "    #   lines if you use sets appropriately.\n",
    "    # \n",
    "    # NOTE - to convert a set back to a list, you can\n",
    "    #   use the list() function (demonstrated in the\n",
    "    #   cell below).\n",
    "    \n",
    "    combined = []\n",
    "    for label in t1_labels:\n",
    "        if label not in combined:\n",
    "            combined.append(label)\n",
    "    for label in t2_labels:\n",
    "        if label not in combined:\n",
    "            combined.append(label)\n",
    "    return combined\n",
    "\n",
    "\n",
    "# testing code\n",
    "labels_1 = [\"python\", \"bug\", \"localization\", \"bug\"]\n",
    "labels_2 = [\"planning\", \"localization\"]\n",
    "\n",
    "combined_labels = consolidate_labels(labels_1, labels_2)\n",
    "\n",
    "assert( set(combined_labels) == set([\"python\", \"bug\", \n",
    "                                     \"localization\", \"planning\"]))\n",
    "print(\"Nice job! Your function works correctly!\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "odds:         [1, 3, 5, 7, 9]\n",
      "as set:       {1, 3, 5, 7, 9}\n",
      "back to list: [1, 3, 5, 7, 9]\n"
     ]
    }
   ],
   "source": [
    "# demonstration of Python's list() and set() functions\n",
    "\n",
    "odds = [1,3,5,7,9]\n",
    "odds_as_set = set(odds)\n",
    "odds_back_to_list = list(odds_as_set)\n",
    "\n",
    "print(\"odds:        \", odds)\n",
    "print(\"as set:      \", odds_as_set)\n",
    "print(\"back to list:\", odds_back_to_list)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
    "# andy's solution\n",
    "def consolidate_labels(t1_labels, t2_labels):\n",
    "    \"\"\"\n",
    "    Combines labels from two tickets without duplication.\n",
    "    \n",
    "    Given t1_labels and t2_labels (both lists), return \n",
    "    a consolidated list of labels without duplicates.\n",
    "    \"\"\"\n",
    "    return set(t1_labels).union(set(t2_labels))\n",
    "\n",
    "labels_1 = [\"python\", \"bug\", \"localization\", \"bug\"]\n",
    "labels_2 = [\"planning\", \"localization\"]\n",
    "\n",
    "combined_labels = consolidate_labels(labels_1, labels_2)\n",
    "\n",
    "assert( set(combined_labels) == set([\"python\", \"bug\", \n",
    "                                     \"localization\", \"planning\"]))"
   ]
  },
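  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One thing to keep in mind: a set does not remember the order in which labels were added, so converting the labels to a set may return them in a different order than the original loop did. If the order matters, one possible approach is sketched below. The function name `consolidate_labels_ordered` is hypothetical, and the sketch assumes Python 3.7 or later, where dictionaries are guaranteed to keep insertion order."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# sketch (assumes Python 3.7+): remove duplicates but keep order\n",
    "def consolidate_labels_ordered(t1_labels, t2_labels):\n",
    "    \"\"\"\n",
    "    Combines labels from two lists without duplication,\n",
    "    keeping the order in which each label first appears.\n",
    "    \"\"\"\n",
    "    # dict keys are unique, like set elements, but a dict\n",
    "    # also remembers the order in which keys were inserted\n",
    "    return list(dict.fromkeys(t1_labels + t2_labels))\n",
    "\n",
    "labels_1 = [\"python\", \"bug\", \"localization\", \"bug\"]\n",
    "labels_2 = [\"planning\", \"localization\"]\n",
    "\n",
    "print(consolidate_labels_ordered(labels_1, labels_2))"
   ]
  },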
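  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Other set notation!\n",
    "\n",
    "Now that you are familiar with a variety of data structures, it is also important to know how to use the documentation that describes them.\n",
    "\n",
    "Take a look at the [Python documentation on sets](https://docs.python.org/3/tutorial/datastructures.html#sets) and see if you can find the operator that solves the `a_or_b_but_not_both` problem from earlier..."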
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
